The discovery [jl|] of a population of young galaxies at an epoch when the 
universe was about one tenth of its current age has shed new light on the 
question of when and how galaxies formed. Within the context of popular 
models this is the population of primeval galaxies that built themselves up 
to the size of present day galaxies through the process of repeated mergers. 
But the recent detection ||] of a large concentration of these primeval galaxies 
appears to be incompatible with hierarchical clustering models, which gener- 
ally predict that clusters of this size are fully formed later in time. Here we 
use a combination of two powerful theoretical techniques, semi-analytic modelling
and N-body simulations, to show that such large concentrations should
be quite common in a universe dominated by cold dark matter, and that they 
are the progenitors of the rich galaxy clusters seen today. We predict the 
clustering properties of primeval galaxies which should, when compared with
data that will be collected in the near future, test our understanding of galaxy
formation within the framework of a universe dominated by cold dark matter.

The longstanding search for primeval galaxies was recently brought to fruition by 
the combination of deep imaging and Keck Telescope spectroscopy [|||. Because they were
originally identified by a sharp break in their spectrum, at the wavelength corresponding 
to the limit of the Lyman series of hydrogen, these galaxies are called "Lyman break" 
galaxies. The most distant examples have redshift z ~ 3.5, corresponding to the time
when the universe was only about 10% of its current age. The structure discovered by 
Steidel and collaborators ||] appears as a spike in the redshift distribution at z = 3.09. It 
contains 15 Lyman-break galaxies in a 9 x 18 arcminute field ||. The comoving spatial 
dimensions of this structure are huge, approximately $8 \times 10 \times 13\,h^{-1}$ Mpc in a Universe with
critical density. (Throughout, we denote Hubble's constant as $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.)
The current paradigm for the formation of cosmic structure is the cold dark matter theory 
[||]. For suitable values of the parameters, this theory provides a good description of large- 
scale structure, from scales probed by microwave background anisotropy measurements 
to those mapped by galaxy surveys ||. In this theory, galaxies form by gas cooling and 
condensing into dark matter halos which grow by accretion and mergers in a hierarchical 
fashion. There is growing observational evidence that young galaxies are indeed built up 
from several small pieces jj]] , S , as predicted by the theory. The technique of semi-analytic 
modelling provides an effective means to describe and quantify the complex physics in- 
volved in galaxy formation and evolution. The processes that need to be considered are 
the shock heating and radiative cooling of gas within collapsing dark matter halos, the 
subsequent transformation of cold gas into stars, the regulation of star formation by feed- 
back from stellar winds and supernovae, and the merging of galaxy fragments. All these 
processes can be encoded in a set of simple, physically motivated rules. A surprisingly 
small number of free parameters are involved, and these are set by reference to a small 
subset of local galaxy data. The result is a fully specified model of galaxy formation that 
can be used to make predictions for the properties of the high redshift universe. Semi- 
analytic models have been remarkably successful in reproducing or predicting a range of 
galaxy properties, such as the luminosity function, number counts and colours ||] [11]
and the evolution of the Hubble sequence of morphological types [12], [13]. In particular,
predictions for the cosmic history of star formation [00] O] were subsequently found to
be in good agreement with observations [14].



The semi-analytic approach on its own provides little direct information about the 
spatial distribution of galaxies. To overcome this limitation we applied the semi-analytic 
galaxy formation rules to dark matter halos grown in high resolution N-body simulations, 
an approach previously employed to study the clustering of the local galaxy population 
[15]. The supercomputer simulations that we analyzed followed the formation of structure
by gravitational instability in two cold dark matter (CDM) universes, one with the critical
density, $\Omega = 1$, and the other with present day density parameter $\Omega_0 = 0.3$ (and h = 0.75).
These span the range of viable models of structure formation of this type. Dark matter 
halos were identified in the simulations at redshift z = 3 and populated with visible 
galaxies according to the predictions of the semi-analytic model for halos of each mass [|j . 
In this way we constructed catalogues of the positions, velocities and spectrophotometric 
properties of all the galaxies present in the simulations, including the effect on the colours 
of the galaxies produced by absorption by intervening gas jRj]. The model galaxies were 
then "observed" in the same filters used by Steidel and collaborators, and mock Lyman- 
break galaxies were identified in exactly the same way as in the real survey ||. The 
spatial distribution of the dark matter and Lyman-break galaxies in our $\Omega = 1$ simulation
is illustrated in Figure 1. The top half of the picture shows a slice through the simulation
at z = 3, 1.6 Gyr after the Big Bang in this cosmology for h = 0.5. Lyman-break galaxies 
are seen to have formed within the largest dark matter halos present at that epoch and 
to trace the densest ridges of the dark matter distribution. A redshift survey covering 
the central region of this slice would reveal a "spike" like that observed by Steidel and 
collaborators Q. The fate of the Lyman-break galaxies in this "spike" is illustrated in the 
bottom panel of Figure 1, which shows the same slice, but at the present day, 11.5 Gyr later.
The material in the region of the spike has collapsed to form the large cluster seen near the
centre of the image. This cluster has a mass of $9 \times 10^{14}\,h^{-1}\,M_\odot$, comparable to the mass of
the Coma cluster [17], and contains the descendants of 60 objects originally identified as
Lyman-break galaxies. The example in Figure 1 illustrates the tendency for Lyman-break
galaxies to form preferentially in large dark matter halos. The dashed line in Figure 2(a)
shows the predicted mass distribution of halos that hosted Lyman-break galaxies at z = 3. 
The solid lines give the corresponding distribution for the present day halos in which the 
Lyman-break galaxies ended up. According to the theory, today's descendants of Lyman- 
break galaxies exhibit a marked preference for dense environments, ranging from poor 
groups to rich clusters, with only a small fraction surviving as isolated galaxies. Thus, 
as conjectured by Steidel and co-workers ||, large concentrations of Lyman-break galaxies 
at high redshift pick out regions where proto-groups and proto-clusters are forming. Our 
simulations clearly show that the precursors of clusters are highly irregular and elongated, 
reflecting their ongoing formation by accretion of massive lumps at the intersection of long 
filaments. It is rather interesting that the field observed by Steidel and collaborators || 
contains a quasar, supporting the view that dense environments favour the formation of
active galaxies [22].
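
The epochs quoted above can be checked with a short calculation. The sketch below assumes a matter-only universe with $\Omega = 1$ (Einstein-de Sitter), for which the age at redshift z is t(z) = (2/3) H_0^{-1} (1+z)^{-3/2}. The function names and unit-conversion constants are illustrative choices, not taken from the simulations themselves.

```python
# Cosmic age in an Omega = 1 (Einstein-de Sitter) universe.
# Assumption: matter-dominated, no cosmological constant, no radiation term.

KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds in 10^9 Julian years


def hubble_time_gyr(h):
    """Return 1/H0 in Gyr for H0 = 100 h km/s/Mpc."""
    h0_per_second = 100.0 * h / KM_PER_MPC
    return 1.0 / h0_per_second / SECONDS_PER_GYR


def age_gyr(z, h):
    """Age of an Omega = 1 universe at redshift z, in Gyr."""
    return (2.0 / 3.0) * hubble_time_gyr(h) / (1.0 + z) ** 1.5


if __name__ == "__main__":
    h = 0.5
    t_z3 = age_gyr(3.0, h)
    t_now = age_gyr(0.0, h)
    print(f"age at z = 3:        {t_z3:.2f} Gyr")
    print(f"age today:           {t_now:.2f} Gyr")
    print(f"time elapsed since:  {t_now - t_z3:.2f} Gyr")
    print(f"age fraction, z=3.5: {age_gyr(3.5, h) / t_now:.3f}")
```

With h = 0.5 this gives an age of roughly 1.6 Gyr at z = 3 and an elapsed time of about 11.4 Gyr to the present, close to the rounded figures in the text. The age at z = 3.5 comes out near one tenth of the present age.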
With our simulations we are able to calculate the probability of finding structures 
as large as the one discovered by Steidel and coworkers ||. To do this, we first located 
galaxies in cylindrical 'skewers' of cross-sectional area equal to the area of the survey, using 
the simulation volumes and their periodic replications to sample a unit redshift interval. 
We then smoothed the galaxy density field with a top hat filter of width $\Delta z = 0.04$, equal
to the binwidth in the actual survey. Finally, we obtained the frequency of distinct peaks
with 3.4 times the mean number of Lyman-break galaxies, which is the overdensity of the
most significant spike in the real survey. We find that 45% of all skewers contain at least
one such spike in a unit redshift interval centred on z = 3 and a significant number contain 
2 or more. These numbers are very similar in the two CDM models we have examined. 
We conclude therefore that structures comparable to the spike recently discovered are 
not at all surprising in cold dark matter models irrespective of the detailed values of the 
cosmological parameters. 
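
The spike-counting step can be illustrated with a simplified sketch. It replaces the top hat smoothing by fixed bins of width 0.04 in redshift and counts distinct runs of bins whose occupancy reaches 3.4 times the mean. The function name, its arguments and the synthetic redshift list are hypothetical and serve only to show the logic; this is not the analysis code used for the simulations.

```python
import math


def count_spikes(redshifts, z_min, z_max, bin_width=0.04, overdensity=3.4):
    """Count distinct peaks in a redshift histogram.

    A peak is a run of adjacent bins, each holding at least
    `overdensity` times the mean number of galaxies per bin.
    Fixed bins stand in for a sliding top hat filter (a simplification).
    """
    n_bins = round((z_max - z_min) / bin_width)
    counts = [0] * n_bins
    for z in redshifts:
        i = math.floor((z - z_min) / bin_width)
        if 0 <= i < n_bins:  # ignore galaxies outside the interval
            counts[i] += 1

    mean = sum(counts) / n_bins
    if mean == 0:
        return 0

    spikes = 0
    in_spike = False
    for c in counts:
        above = c >= overdensity * mean
        if above and not in_spike:  # start of a new distinct peak
            spikes += 1
        in_spike = above
    return spikes


if __name__ == "__main__":
    # Synthetic skewer: one galaxy per bin over a unit redshift interval,
    # plus 15 extra galaxies piled up near z = 3.09.
    background = [2.5 + (i + 0.5) * 0.04 for i in range(25)]
    skewer = background + [3.09] * 15
    print(count_spikes(skewer, 2.5, 3.5))
    print(count_spikes(background, 2.5, 3.5))
```

Applying such a function to many skewers and recording the fraction that return at least one spike gives a frequency of the kind quoted above.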

A more quantitative comparison between theory and observations is provided by the 
correlation function of Lyman-break galaxies, a statistic which is, in principle, measur- 
able from the large surveys currently underway. Previous theoretical attempts to quantify 
clustering at high redshift have been restricted to dark matter halos [18], [19]. With the
semi-analytic approach, on the other hand, we are able to predict directly the clustering 
properties of the observable galaxies themselves. There are substantial differences be- 
tween the clustering statistics of dark halos and Lyman-break galaxies due, for example, 
to scatter in the relation between halo mass and galaxy luminosity and to the fact that 
about 10% of dark halos contain multiple Lyman-break galaxies (see Figure 2(b)). Our
predicted correlation function for Lyman-break galaxies at redshift z = 3 is plotted in
Figure 3. Its amplitude is over 10 times larger than that of the underlying mass distribution,
reflecting the preferential formation of Lyman-break galaxies in dark matter halos of
mass > 10 h Mq (which are rare at z ~ 3), and consistent with the analytic predictions
of Q. The bias parameter (defined as the ratio of rms fluctuations in the galaxies and 
mass respectively in spheres of radius $8\,h^{-1}$ Mpc) is b = 4.2 ± 0.4 for the $\Omega = 1$ model and
b = 2.5 ± 0.2 in the $\Omega_0 = 0.3$ model.
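
As a rough consistency check between the bias values and the correlation amplitudes, one may assume a scale-independent linear bias (a simplifying assumption, not part of the simulation analysis itself), in which case the galaxy correlation function is the mass correlation function scaled by the square of the bias:

```latex
\xi_{\rm gal}(r) \simeq b^{2}\,\xi_{\rm mass}(r), \qquad
b = 4.2 \;\Rightarrow\; b^{2} \approx 17.6, \qquad
b = 2.5 \;\Rightarrow\; b^{2} \approx 6.3 .
```

Under this assumption the $\Omega = 1$ value of b corresponds to a galaxy correlation amplitude more than 10 times that of the mass.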

Cold dark matter models of galaxy formation predict that only a small fraction of the 
stars present today - perhaps as little as 5% - have formed by z ~ 3 [jTTf| . In spite of this, 
the observed abundance of Lyman-break galaxies can be accounted for in these models [0] , 
although the results depend sensitively on the amplitude of density fluctuations, the choice 
of stellar initial mass function and the possible effects of dust on galaxy luminosities. Thus, 
unless the effects of dust have been severely underestimated, the discovery of the Lyman- 
break galaxies indicates that the formation sites of most of the stars in our universe have 
now been identified. According to theory, the Lyman-break galaxies are the brightest, 
most massive, examples of the population of young galaxies present at z ~ 3. Because 
of this, they are expected to be clustered much more strongly than the underlying dark 
matter, an expectation that, as we have shown, is consistent with the detection of a large 
spike in their redshift distribution. These data provide incontrovertible evidence for biased 
galaxy formation. At lower redshifts, other surveys [21] generally pick out less extreme
representatives of the galaxy population which tend to be less strongly biased than the 
Lyman-break galaxies are at z ~ 3. Using the theoretical approach that we have outlined 
here it is possible to relate data on galaxy clustering at various redshifts with theoretical 
expectations. Such comparisons, particularly with forthcoming surveys at high redshift, 
will test the model of galaxy formation in a cold dark matter universe. 
Figure Captions 



Figure 1. (a) The distribution of mass and Lyman-break galaxies in a slice of an $\Omega = 1$
CDM simulation at z = 3. The slice has side $50\,h^{-1}$ Mpc and thickness $5\,h^{-1}$ Mpc. The
region plotted contains a large cluster at z = 0. The simulation was performed using a 
parallel tree-code with 3 million particles. The high resolution allows dark matter halos 
as small as that of the Large Magellanic Cloud to be resolved. The initial amplitude of 
density fluctuations was chosen so as to approximately reproduce the observed abundance 
of rich galaxy clusters today |^]. The logarithm of the dark matter density is colour- 
coded so that the highest density regions are white and the lowest density regions are 
black. The blue circles mark the locations where, according to our semi-analytic galaxy 
formation model, Lyman-break galaxies have formed. Whenever an individual dark matter 
halo contains more than one Lyman-break galaxy, the first one is placed at the centre of 
the halo and the rest at the positions of randomly selected halo particles. According to 
the semi-analytic model, there are 900 Lyman-break galaxies in the entire $(50\,h^{-1}\,\mathrm{Mpc})^3$
simulation volume. As expected from |0|, both our $\Omega = 1$ and $\Omega_0 = 0.3$ models produce an
abundance of Lyman-break galaxies at high redshift consistent with the observed counts. 
The Lyman-break galaxies are preferentially found in the higher density regions and, as 
a result, they are strongly biased relative to the underlying mass distribution. (b) The 
same slice as in (a) evolved to the present day. The blue circles mark the positions of 
particles which were associated with Lyman-break galaxies at z = 3. The largest halo in 
the simulation volume has a mass comparable to that of the Coma cluster and contains 
the descendants of 60 Lyman-break galaxies. The general appearance of the $\Omega_0 = 0.3$
simulation is qualitatively similar. 



Figure 2. (a) The mass distributions of dark matter halos that host Lyman-break galaxies 
at redshift 3 (dashed lines) and of halos that host their descendants at the present time 
(solid line). At high redshift, Lyman-break galaxies are preferentially located within the 
most massive halos that have formed at that time. These halos tend to merge together 
to form the most massive structures present at z = 0. Today, the descendants of Lyman- 
break galaxies are to be found in a range of environments, from groups to clusters, with 
only a small fraction surviving in smaller, isolated halos. (b) The number of Lyman break 
galaxies as a function of halo mass at z=3 (crosses). Around 10% of the halos that host 
Lyman break galaxies contain more than one of these objects. The circles show the number 
of Lyman break progenitors as a function of halo mass at z = 0. The most massive cluster 
in the simulation volume had 60 progenitors that were identified as Lyman break galaxies 
at z = 3. 

Figure 3. Left panel: Correlation functions, $\xi(r)$, in the $\Omega = 1$ CDM simulation at z = 3.
The abscissa gives the pair separation in comoving coordinates. The dashed line shows the 
true spatial correlation function of the Lyman-break galaxies. The thick solid line shows 
the apparent correlation function in redshift space; the correlations are reduced on small
scales because of smearing by random motions within virialized halos and are slightly 
enhanced on large scales due to coherent inflow onto collapsing structures. The dotted 
line shows the analytic prediction of |^] for the real-space correlation function. Overall, 
the agreement with the simulation result is very good. On small scales the simulation 
results are slightly higher, reflecting small inaccuracies in the analytic model. On large 
scales the simulation results are slightly suppressed by finite box effects in our relatively 
small simulation volume. On these large scales the analytic results are more reliable. The 
thin solid line shows the cold dark matter correlation function at z = 3. Right panel: 
The redshift space correlation functions in our $\Omega = 1$ (SCDM, solid line) and $\Omega_0 = 0.3$
(OCDM, dashed line) simulations. The error bars were obtained by bootstrap resampling.
The comoving clustering length of Lyman-break galaxies (defined as the pair separation 
at which $\xi(r) = 1$) at z = 3 is comparable to the clustering length of bright galaxies
today. Perhaps counterintuitively, the Lyman-break galaxy correlation functions in the 
two models are quite similar, with characteristic clustering lengths which differ by only 
~ 50%. (The inclusion of a non-zero cosmological constant makes very little difference to 
the results of an $\Omega_0 \sim 0.3$ model; cf. Fig 8 of Q). This is due in part to the normalization
of the simulations which requires that they should approximately reproduce the observed 
abundance of clusters at the present day. 